Virtual Memory
Introduction Rather than have process access memory locations directly, virtual memory is to provide a level of abstraction for processes, this abstraction removes many limits imposed by physical properties of the target machine. The first being code needs to be relocatable in memory, it should not matter where the program or its data lies in memory. Virtual memory supports this by giving each application its own logical address space to work in, which will remain valid despite where it is located in physical memory. Virtual memory also gives users of the system that there exists more physical memory than there really is. This is achieve by providing a large virtual address space and swapping in and out data from disk into memory when it is being used. Virtual Addresses Modern operating systems provide a virtual address space for processes to exist in. Generally the virtual address space size is determined by the length of the instruction word, ie 32bit CPU has 4gigs of virtual address space. Paging The virtual address space is then fixed-partitioned into equal size partitions, each partition is known as a page. In a virtual memory system memory is allocated in pages, all of a fixed length. In practice page sizes range from 512bytes to 4k and even higher depending upon the application. Likewise physical memory is divided into pages of the same size, they are called frames, and a mapping is developed at run time between virtual pages and physical frames. This allows processes to always see the same virtual addresses but the system to place the pages anywhere it pleases in physical memory, all that has to be adjusted is the mapping between virtual and physical page numbers. This operation is handled by a page table which holds the mapping information. Virtual addresses now consist of a page number + offset. The offset does not require translation as it is the same in virtual and physical address space. Page Tables Page tables contain the translation from virtual address to physical address. Page table is generally an array indexed by page number and each entry contains a frame number. Page table entry *Present/Absent bit Also called valid bit, it indicates a valid mapping for the page *Modified bit Also called dirty bit, it indicates the page may have been modified in memory *Reference bit Indicates the page has been accessed *Protection bits Read permission, Write permission, Execute permission Or combinations of the above *Caching bit Use to indicate processor should bypass the cache when accessing memory *Page Frame Number Two Level Page Table http://upload.wikimedia.org/wikipedia/commons/8/8e/X86_Paging_4K.svg Rather than keep a page table which holds a location for every virtual address, which in 64bit address space starts to get quite large, we can use some of the page tables own magic on itself. We split the page table into two levels. A top global table and a number of smaller tables each which cover a section of the virual address space. We now use the first half of the virtual address as an index on the top table and the second used as an index in the second tier tables. The global table contains a reference to which second level table the translation exists in, which are allocated on use. If a second table hasn't been allocated if points to null. The second table is then pointed to and the second half the virtual address is used as an index in the second table. The second table then contains the translation information. The two level page table takes advantage of the fact that process will generally use the bottom of memory for code and data and the top for the stack. Leaving alot of addresses unused. So rather than allocate tables for addresses that aren't used we don't. Inverted Page Table The inverter page table turns a normal page table on its head. Rather than index the table by its page number the page table is indexed by the frame number. To assist in finding the locations of the page number another table, a hash table is used. Virtual addresses are then hashed, which reveals a location in the hash table. The entry into the hash table then contains a frame number which is followed to the inverted page table. If the inverter page table entries page number matches the one hashed the index is the frame number. Otherwise a hash collision has occurred and a link in the inverted table to another entry is followed. This process is continued until a match for the page number is found, the index of which is the frame number. The inverted page table has a number of advantages, the first that the inverter page table only needs to be the size of physical memory, which is generally smaller than the virtual address space. Although IPT generally struggle sharing memory between multiple processes which is something easily achieve with a normal page table. Translation Lookaside Buffer Instead of looking up translations every time memory is accessed, which would slow the memory access significantly, a cache of recently used addresses is kept in a hardware structure known as the TLB. The TLB contains the page number address space id then a corresponding translation with the frame number and number of bits, N,D,V,G. N = non cachable D = dirty ie write protect V = Fuck You G = global translation ie ignore asid. Usually containing 64 translations the page table only needs to be accessed when the TLB faults, ie with the vaild bit is zero. Translations from the pagetable can then be processed and loaded back into the TLB. Fetch Policy Demand Paging With virtual memory and swapping available only sections of the running process need to lie in memory and when other sections are required they can simply be paged in to meet the demand. This allows less memory to be used for the same process, effectively increasing the capacity of the system. To ensure that paging memory in and out of the system is performed effectively a number of properties must be observed. Pre-paging Bring more pages into memory than currently required, hoping that they will be used in future. It improves I/O preformance by reading large chunks, it can use idle I/O time. Can waste I/O bandwidth if pages aren't used, especially if working set pages are ejected for unused pre-pages. Locality The principle of locality has been discovered from empirical studies of programs. It arrives at a rule of thumb that 90% of processing is spent in 10% of code. This can be taken advantage of using two properties of locality Temporal: recently accessed items are likely to be reused in the future Spatial: 'items whose addresses are close to one another are likely to be used closely in time. Working Set A working is the subset of data and code that a process uses in a window of time x. A working set is an approximation of a processes locality, too small it won't cover everything, too large and it will encompass several localities. A system should always attempt to cover at a processes working set to run efficiently. A processes working set tends to change gradually with time. Thrashing CPU & memory usuage tends to increase with the degree of multitasking. Thrashing occurs when a process or processes working set no longer fits in memory and must be continually swapped in and out of disk. This results in the system spending more time swapping to and from disk rather than in computation. To recover form thrashing a system must suspend a number of processes until more memory has become available. Page Replacement Algorithms When memory is scarce, pages can be kicked out of memory and stored on disk to free up space. This section discusses ways to choose the next page to evict. Optimal whatsoever ''The best page to kick out is the one that won't be accessed for the longest time. '''Disadvantage: Impossible to implement. Obviously this is impossible to work out in practice (short of pre-running the program to find out what it accesses. Apparently MS Vista actually tries crap like this), and as such this is useful only as a theoretical benchmark and pie-in-the-sky. First-In,First-Out (FIFO) Kick out the page that has been hanging around in memory the longest. * Disadvantage: Will remove heavily-used pages simply because they've been around a long time. * Disadvantage: 'Belady's Anomaly' - more physical frames do not imply fewer page faults. * Advantage: Easy to implement. Least Recently Used (LRU) Kick the page that hasn't been referenced for the longest time. Works on the principle that a page that hasn't been accessed in a while is unlikely to be accessed anytime soon. This works (because of the Principle of Locality). * Advantage: Picks really good pages to swap out. * Disadvantage: Requires a timestamp to be updated on every single page reference. * Disadvantage: Requires an exhaustive search of all pages' timestamps to choose a page. * Disadvantage: Because of the above two, basically impossible to implement efficiently. Clock/Second-Chance Kick the next page on the list, unless it's been referenced recently in which case you get a second chance. This is basically the 'good compromise' algorithm that sane people use. If you are asked what is a suitable algorithm, say this one. Uses a per-page 'reference' bit, either in the frame table or the page table. Every time a page is accessed, this bit is set. Keeps track of a pointer to the last page that was swapped. When a new page is needed, cycle through the pages from this point. If a page has a nonzero reference bit, zero it and keep looking. If a page has a zero reference bit, evict it. * Disadvantage: Not as good at choosing as LRU or optimal * Disadvantage: Slower and harder to implement than FIFO * Advantage: Faster and easier to implement than LRU * Advantage: Better at choosing than FIFO Resident Set Size How many frames should a process be allocated? Fixed Allocation Each process is given a fixed number of frames in which to operate. If a page fault occurs it is replaced with in the fixed allocation given to the process. Fixed allocation struggles to achieve high utilisation as some processes will need more than given and some will use less. Variable Allocation Global: '''Process keeps global list of free frames and gives process free frame when requested. When it stops using a frame it is returned to the free-list. If there are no free frames page can be taken from any of the processes. '''Local: Give process a number of frames based upon properties of the process. Pages are replaced with in this subset. The frame allocation is reviewed periodically. Frequency: Attempts to achieve acceptable number of page faults. If process is not faulting frames are taken away, if faulting too much given more frames. Category:COMP3231 Category:Operating Systems